Counting words
On the final exam you will need to count the occurrences of a string within a longer string. In this example you will count the words in a famous poem about Python. Along the way you will learn about several handy built-in packages (libraries) for working with natural (human) language text, such as poems.
>>> import this
>>> this.s
"Gur Mra bs Clguba, ol Gvz Crgref\n\n... bs gubfr!"
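The Python core developers encoded their poem with ROT-13, a substitution cipher that replaces each letter with the one 13 places further along the English alphabet, wrapping around at the end. Because the alphabet has 26 letters, applying ROT-13 a second time restores the original text, so you can decode the poem with the rot_13 codec: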
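>>> import codecs
>>> from collections import Counter
>>> poem = codecs.decode(this.s, 'rot_13')  # decode the ROT-13 text back to English
>>> counts = Counter(poem)  # iterating over a string counts characters, not words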
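ROT-13 is simple enough to sketch by hand, which may help make the rotation concrete. The snippet below is only an illustration: the helper name rot13 and the table name rot13_table are made up for this example, and the sample string is the first line of the encoded poem shown above.

```python
import codecs
import string

# Build a translation table that maps each ASCII letter to the letter
# 13 places later, wrapping around at the end of the alphabet.
lower = string.ascii_lowercase
upper = string.ascii_uppercase
rot13_table = str.maketrans(
    lower + upper,
    lower[13:] + lower[:13] + upper[13:] + upper[:13],
)


def rot13(text):
    """Rotate every ASCII letter by 13 places (hypothetical helper)."""
    return text.translate(rot13_table)


encoded = "Gur Mra bs Clguba, ol Gvz Crgref"
decoded = rot13(encoded)
print(decoded)          # The Zen of Python, by Tim Peters
print(rot13(decoded))   # rotating twice returns the encoded text
```

Punctuation, spaces, and newlines are not in the table, so translate leaves them unchanged, just as the rot_13 codec does.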
To create a word counter you need to split the text into words. First remove the punctuation, so that splitting the string on whitespace does not leave you with tokens like “Python,” or “Python!”. You probably also want to lowercase all the words so that “Python” is counted as the same word as “python”.
>>> import string
>>> for c in string.punctuation:
...     poem = poem.replace(c, ' ')  # strings are immutable, so keep the returned copy
>>> words = poem.lower().split()
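As an alternative to replacing one punctuation character at a time, the same cleanup can be sketched in a single pass with str.maketrans and str.translate. The sample sentence and the name punct_to_space are assumptions made for this illustration. With the real poem you would translate the decoded text instead.

```python
import string

# A short sample line standing in for the decoded poem.
line = "Beautiful is better than ugly. Explicit is better than implicit!"

# Map every punctuation character to a space.
punct_to_space = str.maketrans(
    string.punctuation, ' ' * len(string.punctuation)
)

# Strip punctuation, lowercase, then split on whitespace.
words = line.translate(punct_to_space).lower().split()
print(words)
```

Replacing punctuation with a space rather than deleting it keeps words apart when they are joined only by punctuation, such as the two halves of a hyphenated phrase.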
Now you can count the occurrences of a word in “The Zen of Python” using the accumulator pattern:
>>> better_count = 0
>>> for w in words:
...     if w == 'better':
...         better_count += 1
...
>>> print(better_count)
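The accumulator pattern extends naturally from one word to every word if the single integer is swapped for a dictionary of running totals. The following is a minimal sketch on a short sample list. The name word_counts is an assumption made for this example.

```python
# A short sample standing in for the cleaned-up list of words.
words = "flat is better than nested sparse is better than dense".split()

# Accumulate one running total per distinct word.
word_counts = {}
for w in words:
    # dict.get returns 0 the first time a word is seen.
    word_counts[w] = word_counts.get(w, 0) + 1

print(word_counts['better'])  # 2
print(word_counts['flat'])    # 1
```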
There’s a Counter class in the built-in collections package that can count the hashable objects in any list (or other iterable) and return the counts as the values in a dictionary whose keys are those objects:
>>> from collections import Counter
>>> counts = Counter(words)
>>> better_count = counts['better']
>>> print(better_count)
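A Counter offers a few conveniences beyond a plain dictionary. The sketch below uses a short sample sentence rather than the full poem, and the name top_three is an assumption made for this example.

```python
from collections import Counter

# A short sample standing in for the cleaned-up list of words.
words = ("now is better than never although never "
         "is often better than right now").split()
counts = Counter(words)

# most_common(n) returns the n highest counts as (word, count) pairs.
top_three = counts.most_common(3)
print(top_three)

# A word that never appeared has a count of 0 instead of raising KeyError.
print(counts['python'])  # 0
```

Returning zero for a missing key means you can look up any word without first checking whether it appeared in the text.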